{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Engaging Image Input/Output with OpenAI's Responses API in AG2\n",
    "\n",
    "**Author:** Yixuan Zhai\n",
    "\n",
    "This notebook demonstrates how to do image input and image generate through a two-agent chat with OpenAI's Responses API and their gpt-4.1 model.\n",
    "\n",
    "**Note: Current support for the OpenAI Responses API is limited to `initiate_chat` with a two-agent chat. Future releases will included expanded support for group chat and the `run` interfaces.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an example, using image inputs with the OpenAI Responses model provider in AG2 we can generate stylized versions of the images:\n",
    "\n",
    "![Style 1](https://media.githubusercontent.com/media/ag2ai/ag2/refs/heads/main/notebook/openai_responses_style_1.png)\n",
    "\n",
    "![Style 2](https://media.githubusercontent.com/media/ag2ai/ag2/refs/heads/main/notebook/openai_responses_style_2.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set LLM config to use OpenAI response API\n",
    "\n",
    "For image generation, we need to add the built in tool `image_generation`.\n",
    "\n",
    "Visit the [`OpenAI Responses API Documentation`](https://platform.openai.com/docs/api-reference/responses) for more information.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install AG2 and dependencies\n",
    "\n",
    "To be able to run this notebook, you will need to install AG2 with the `openai` extra.\n",
    "````{=mdx}\n",
    ":::info Requirements\n",
    "Install `ag2` with 'openai' extra:\n",
    "```bash\n",
    "pip install ag2[openai]\n",
    "```\n",
    "For more information, please refer to the [installation guide](https://docs.ag2.ai/latest/docs/user-guide/basic-concepts/installing-ag2).\n",
    ":::\n",
    "````"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import base64\n",
    "import os\n",
    "import textwrap\n",
    "\n",
    "from autogen import AssistantAgent\n",
    "\n",
    "# LLM config\n",
    "llm_cfg = {\n",
    "    \"config_list\": [\n",
    "        {\n",
    "            \"api_type\": \"responses\",  # use 'responses' for OpenAI Responses API\n",
    "            \"model\": \"gpt-4.1\",  # supports vision + images\n",
    "            \"api_key\": os.getenv(\"OPENAI_API_KEY\"),\n",
    "            \"built_in_tools\": [\"image_generation\"],\n",
    "        }\n",
    "    ]\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an assistant for image processing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "assistant = AssistantAgent(\n",
    "    name=\"ArtBot\",\n",
    "    llm_config=llm_cfg,\n",
    "    system_message=textwrap.dedent(\"\"\"\n",
    "        You are an assistant that can reason over images and\n",
    "        use the built-in image_generation tool. When generating\n",
    "        an image, return ONLY the tool call result you receive.\n",
    "    \"\"\").strip(),\n",
    ")\n",
    "\n",
    "#  initial image (URL or data-URI)\n",
    "IMAGE_URL = \"https://upload.wikimedia.org/wikipedia/commons/3/3b/BlkStdSchnauzer2.jpg\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = assistant.run(message=f\"Describe this image <{IMAGE_URL}> in one sentence\", user_input=True)\n",
    "\n",
    "response.process()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use formal image input to reduce hallucination\n",
    "\n",
    "Sometimes, using image links as a part of natural language will cause hallucinations in the follow-up questions.\n",
    "\n",
    "The OpenAI Response API provides a formal way for image input, visit the [`Image Input`](https://platform.openai.com/docs/guides/images-vision?api-mode=responses) for more information.\n",
    "\n",
    "In this example, ask the assistant to generate different variations of the image:\n",
    "\n",
    "- \"Give me a Ghibli-style version of the image\"\n",
    "- \"Give me a version of the image in a Matrix style\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# initialize chat with image input\n",
    "chat = {\n",
    "    \"role\": \"user\",\n",
    "    \"content\": [\n",
    "        {\"type\": \"input_text\", \"text\": \"Describe this image in one sentence.\"},\n",
    "        {\"type\": \"input_image\", \"image_url\": IMAGE_URL},\n",
    "    ],\n",
    "}\n",
    "\n",
    "response = assistant.run(message=chat, user_input=True)\n",
    "\n",
    "response.process()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Save generated images\n",
    "\n",
    "Run the following cell to save the generated images from the previous conversation.\n",
    "\n",
    "An image will be saved for each message in the chat history that had an image generated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ----helper function to save image from base64 string----\n",
    "def save_b64_png(b64_str, fname=\"generated.png\"):\n",
    "    with open(fname, \"wb\") as f:\n",
    "        f.write(base64.b64decode(b64_str))\n",
    "    print(f\"image saved → {fname}\")\n",
    "\n",
    "\n",
    "messages = response.messages\n",
    "for i in range(len(messages)):\n",
    "    print(i)\n",
    "    message = messages[i]\n",
    "    # print(message)\n",
    "    if message.get(\"name\") == \"ArtBot\":\n",
    "        contents = message.get(\"content\", [])\n",
    "        for content in contents:\n",
    "            if (\n",
    "                content.get(\"type\") == \"tool_call\"\n",
    "                and content.get(\"name\") == \"image_generation\"\n",
    "                and \"content\" in content\n",
    "                and content[\"content\"]\n",
    "            ):\n",
    "                print(\"Saving image!\")\n",
    "                save_b64_png(content[\"content\"], f\"image{i}.png\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Image costs\n",
    "\n",
    "Image costs are not provided by OpenAI's API, instead they need to be calculated. In AG2 this is done automatically and is included in the chat result's `cost` attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"The cost of the conversation, including image generation, is: {response.cost}\")"
   ]
  }
 ],
 "metadata": {
  "front_matter": {
   "description": "This notebook demonstrates how to do image input and image generate through a two agent chat with OpenAI Responses API and gpt-4o.",
   "tags": [
    "multimodal",
    "responses"
   ]
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
